Real-Time Regression Analysis of Streaming Clustered Data With Possible Abnormal Data Batches

Abstract

This article develops an incremental learning algorithm based on the quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes, such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both the offline QIF and the offline generalized estimating equations (GEE) approaches, which process the entire cumulative data all together, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential goodness-of-fit test, used as a screening procedure for occurrences of abnormal data batches. We implement the proposed methodology by expanding the existing Spark Lambda architecture for the operation of statistical inference and data quality diagnosis. We illustrate the proposed methodology with extensive simulation studies and an analysis of streaming car crash datasets from the National Automotive Sampling System-Crashworthiness Data System (NASS CDS). Supplementary materials for this article are available online.
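The idea of renewing estimates from summary statistics alone can be illustrated with a much simpler model than the one in the article. The sketch below is not RenewQIF: it assumes an ordinary linear regression with independent errors, where the running sums X'X and X'y are sufficient to refit after each batch. The class name `RenewableLeastSquares` and the simulated data are hypothetical and exist only for this illustration.

```python
import numpy as np


class RenewableLeastSquares:
    """Hypothetical illustration (not the article's RenewQIF): a streaming
    linear regression that keeps only summary statistics between batches."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))  # running sum of X'X
        self.xty = np.zeros(n_features)                # running sum of X'y
        self.beta = np.zeros(n_features)               # current estimate

    def update(self, X, y):
        # Fold the new batch into the summaries; the raw batch can then be discarded.
        self.xtx += X.T @ X
        self.xty += X.T @ y
        # Renew the estimate from the accumulated summaries only.
        self.beta = np.linalg.solve(self.xtx, self.xty)
        return self.beta


# Simulated stream of five batches (assumed data, for illustration only).
rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5])
model = RenewableLeastSquares(n_features=3)
batches = []
for _ in range(5):
    X = rng.normal(size=(200, 3))
    y = X @ true_beta + rng.normal(scale=0.1, size=200)
    batches.append((X, y))  # kept here only to build the offline comparison
    model.update(X, y)

# Offline fit that processes the entire cumulative data at once.
X_all = np.vstack([X for X, _ in batches])
y_all = np.concatenate([y for _, y in batches])
offline_beta, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
```

In this linear special case the renewed estimate agrees with the offline fit up to floating-point error, because the summaries carry all the information the estimator needs. For correlated outcomes and nonlinear models, the article's method renews different summary statistics, and this sketch should not be read as its implementation.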

Similar resources

Fuzzy Data Envelopment Analysis for Classification of Streaming Data

The classification of fuzzy uncertain data is considered one of the most challenging issues in data analysis. In spite of the significance of fuzzy data in mathematical programming, the development of the analytical methods of fuzzy data is slow. Therefore, the current study proposes a new fuzzy data classification method based on fuzzy data envelopment analysis (DEA) which can handle strea...

Local polynomial regression analysis of clustered data

This paper proposes a classical weighted least squares type of local polynomial smoothing for the analysis of clustered data, with the key idea of using generalised inverses of correlation matrices. The estimator has a simple closed-form expression. Simplicity is achieved also for nonparametric generalised linear models with arbitrary link function via a transformation. Our approach can be char...

Quantile regression with clustered data

We show that the quantile regression estimator is consistent and asymptotically normal when the error terms are correlated within clusters but independent across clusters. A consistent estimator of the covariance matrix of the asymptotic distribution is provided and we propose a specification test capable of detecting the presence of intra-cluster correlation. A small simulation study illustrat...

Real-Time Streaming Data Delivery over Named Data Networking

Named Data Networking (NDN) is a proposed future Internet architecture that shifts the fundamental abstraction of the network from host-to-host communication to request-response for named, signed data–an information dissemination focused approach. This paper describes a general design for receiver-driven, real-time streaming data (RTSD) applications over the current NDN implementation that aims...

Journal

Journal title: Journal of the American Statistical Association

Year: 2022

ISSN: 0162-1459, 1537-274X, 2326-6228, 1522-5445

DOI: https://doi.org/10.1080/01621459.2022.2026778